Normalization of Digital Mathematics Library Content: MathML Canonicalization
Abstract
This paper discusses the need for data normalization in a Digital Mathematics Library (DML). Specifically, it emphasizes the canonicalization of formulae encoded in Presentation MathML, a notation that is becoming available in several DMLs and is used by DML applications. Canonicalization is a prerequisite for advanced processing, namely math-enabled full-text searching, semantic filtering, and automated classification. Different sources of MathML and their specifics are described. Several use cases of possible formula canonicalization transformations are listed and discussed in detail. Finally, the findings are summarized and the design of a canonicalization tool, still to be developed, is outlined.
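To make the idea of canonicalization concrete, here is a minimal Python sketch. It is not the tool outlined in the paper. It applies three illustrative rules chosen only for this example: whitespace inside token elements (`mi`, `mn`, `mo`, `mtext`) is collapsed and NFC-normalized, an ASCII hyphen-minus used as an operator is mapped to U+2212 MINUS SIGN, and an `mrow` that wraps a single child is unwrapped. The names `canonicalize`, `canonical_string`, and `OPERATOR_MAP` are hypothetical.

```python
import unicodedata
import xml.etree.ElementTree as ET

# Token elements whose text content is normalized (assumed set for this sketch).
TOKEN_TAGS = {"mi", "mn", "mo", "mtext"}

# Hypothetical operator mapping: ASCII hyphen-minus -> U+2212 MINUS SIGN.
OPERATOR_MAP = {"-": "\u2212"}


def local_name(tag: str) -> str:
    """Drop an ElementTree '{namespace}' prefix, e.g. '{...MathML}mi' -> 'mi'."""
    return tag.rsplit("}", 1)[-1]


def canonicalize(elem: ET.Element) -> ET.Element:
    """Return a new, canonicalized copy of a Presentation MathML subtree."""
    name = local_name(elem.tag)
    children = [canonicalize(child) for child in elem]

    # Unwrap rule: an <mrow> grouping a single child adds no structure.
    if name == "mrow" and len(children) == 1 and not (elem.text or "").strip():
        return children[0]

    # Sort attributes so equivalent inputs serialize identically.
    out = ET.Element(name, dict(sorted(elem.attrib.items())))
    if name in TOKEN_TAGS:
        # Whitespace rule: collapse internal runs, trim, then apply Unicode NFC.
        text = unicodedata.normalize("NFC", " ".join((elem.text or "").split()))
        if name == "mo":
            # Operator rule: unify operator characters via OPERATOR_MAP.
            text = OPERATOR_MAP.get(text, text)
        out.text = text
    # Text of non-token elements and all tail text (inter-element whitespace)
    # is not copied, since the new elements are built from scratch.
    out.extend(children)
    return out


def canonical_string(mathml: str) -> str:
    """Parse MathML markup and return its canonical serialization."""
    return ET.tostring(canonicalize(ET.fromstring(mathml)), encoding="unicode")


if __name__ == "__main__":
    a = ('<math xmlns="http://www.w3.org/1998/Math/MathML">'
         '<mrow><mi> a </mi><mo>-</mo><mi>b</mi></mrow></math>')
    b = '<math><mrow><mrow><mi>a</mi></mrow><mo>&#x2212;</mo><mi>b</mi></mrow></math>'
    print(canonical_string(a) == canonical_string(b))  # True
```

Both encodings of a − b serialize to `<math><mrow><mi>a</mi><mo>−</mo><mi>b</mi></mrow></math>`. The canonical form could therefore serve as a key in a formula index, which is why canonicalization matters for math-aware search. A practical tool would need many more rules, and each one has to be checked against contexts where the "redundant" variation actually carries meaning.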